Yet Another Locater

2018/04/27, 2021/02/28

YAL finds, within files or directories of files, the lines that meet a set of criteria, both positive and negative.

Use

Doubleclick !YAL to put its icon on the iconbar.

To tell YAL what to search for, click Select on the iconbar icon. This opens a directory called Locate containing a text file called Search. This you need to edit in a text-editor, so that each line determines a search-criterion (see below).

After that you drag onto the iconbar-icon the file or directory in which to search and YAL will report its findings in a taskwindow, giving the linenumbers and lines that match the criteria (and the filename in the case of a directory). The same search criteria will be used for subsequent drags of files or directories, until you edit the Search file.

Clicking Adjust on the iconbar icon opens the configuration file in a text-editor. It determines the values:

  • validtype : a list determining the filetypes allowed in the search. The default is just textfiles.
  • validleaf : a pattern that the names of files to be searched must match. The default pattern matches anything.
  • recurse : a Boolean value determining whether the search recurses down subdirectories. The default value is true.
These values can be edited, and the configuration file saved.
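
The effect of validleaf and recurse on a directory search can be sketched in Python. This is only an illustration under stated assumptions, not YAL's own code: the function name candidate_files is invented, a Python regular expression stands in for the validleaf pattern, and the validtype check is left out because filetypes are not available on most other filing systems.

```python
import os
import re


def candidate_files(directory, validleaf=".*", recurse=True):
    """Return, sorted, the paths under directory whose leafname matches validleaf."""
    leaf = re.compile(validleaf)  # assumption: a Python regex stands in for the pattern
    found = []
    for dirpath, dirnames, filenames in os.walk(directory):
        if not recurse:
            dirnames.clear()  # emptying the list in place stops os.walk descending
        for name in filenames:
            if leaf.search(name):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

With the defaults every file in the directory and its subdirectories is a candidate; passing recurse=False restricts the search to the top level.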

Search files

Each line of the search file adds a criterion, which is negative if the line starts with the symbol ~. That is to say, lines that satisfy the criterion will be rejected. Only lines satisfying all the positive criteria and none of the negative will be displayed. The criteria may appear in any order and may be of two kinds: text or pattern. The syntax of patterns will be explained below.

A positive textual criterion is just literal text, not starting with ~, on a single line. The criterion is satisfied if the text appears as a substring anywhere in the line. A line may be reported several times if the text appears at different positions in it. A negative textual criterion is a line starting with ~ followed immediately by a blank space. The remainder of the line, including any blank spaces, determines the criterion: lines will not be reported if they contain that remainder as a substring. Be careful about blank spaces, since leading and trailing ones count.

A searchfile

Robert
~ Lady Jane
will report all lines mentioning Robert somewhere so long as they do not mention Lady Jane anywhere.
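
The selection rule for textual criteria can be sketched in Python. This is an illustration of the rule as described, not YAL's own code: the names parse_criteria and line_is_reported are invented, pattern criteria are skipped, empty lines are assumed to add no criterion, and each matching line is reported once only.

```python
def parse_criteria(search_text):
    """Split a search file's text into positive and negative substrings."""
    positive, negative = [], []
    for line in search_text.splitlines():
        if not line:
            continue  # assumption: empty lines add no criterion
        if line.startswith(("==", "~=")):
            continue  # pattern criteria are outside this sketch
        if line.startswith("~ "):
            negative.append(line[2:])  # everything after "~ ", spaces included
        else:
            positive.append(line)
    return positive, negative


def line_is_reported(line, positive, negative):
    """True if line contains every positive text and no negative text."""
    return (all(text in line for text in positive)
            and not any(text in line for text in negative))


positive, negative = parse_criteria("Robert\n~ Lady Jane\n")
sample = [
    "Robert rode out at dawn.",
    "Robert escorted Lady Jane to supper.",
    "Lady Jane stayed at home.",
]
reported = [line for line in sample if line_is_reported(line, positive, negative)]
print(reported)
```

Of the three sample lines only the first is reported: the second mentions Lady Jane and the third does not mention Robert.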

Pattern criteria begin the line with == for positive criteria and ~= for negative. What follows must be a pattern, preceded or followed by any number of blank spaces. Blank spaces cannot occur in patterns, so in this case they are simply ignored.

A searchfile containing

==          ^%s*Dear%s+Sir
==          appalling%.$
~=           Sir%s+James
~=           my%s+wife
for example, reports any line beginning with the phrase Dear Sir (possibly after some blank space) and ending with appalling. (full stop included), so long as it does not contain the phrase Sir James or my wife.
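
The same four criteria can be tried out in Python. Python's re module does not understand the pattern syntax used by YAL, so in this sketch the patterns are translated by hand into Python regular expressions (%s becomes \s and %. becomes \.). The sample lines and the name line_is_reported are invented for the illustration.

```python
import re

# Hand-made Python equivalents of the four patterns in the searchfile above.
POSITIVE = [re.compile(r"^\s*Dear\s+Sir"), re.compile(r"appalling\.$")]
NEGATIVE = [re.compile(r"Sir\s+James"), re.compile(r"my\s+wife")]


def line_is_reported(line):
    """True if every positive pattern matches and no negative one does."""
    return (all(p.search(line) for p in POSITIVE)
            and not any(n.search(line) for n in NEGATIVE))


sample = [
    "Dear Sir, the service on your trains is appalling.",
    "  Dear  Sir, what Sir James said was appalling.",
    "Dear Sir, my wife found the tea appalling.",
    "Dear Sir, the tea was excellent.",
]
reported = [line for line in sample if line_is_reported(line)]
print(reported)
```

Only the first sample line survives: the second and third are rejected by the negative criteria, and the fourth does not end with appalling.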

You can, of course, save your searchfiles for later use.

Why another locater?

I have quite a few locater applications, for searching through multiple files. Some work well, others not at all, perhaps because they need recompiling. Most are a lot slicker than YAL, with nice dialogues for the settings, some with throwback. None, however, offer the pattern-syntax that I am used to. Apart from the pattern-syntax YAL is a simple, almost minimal, application.

Most people probably only use word search with textual criteria. Until you are used to them, patterns may seem bothersome - another thing to be learned. But they are much more powerful, and they can repay the effort. In fact YAL translates textual criteria into pattern criteria.

Pattern-syntax

A pattern could be simply a word. But not a phrase, because blank spaces are not allowed in patterns. If you wanted to search for the phrase
                 did you say "another $2"?
you would use the pattern
                 did%syou%ssay%s"another%s%$2"%?
The expression %s is the pattern for a blank space. The so-called magic characters,
                          ^ $ ( ) % . [ ] * + - ?
must be preceded by a percent (%) sign to match themselves in plain text. This is so that magic characters not preceded by % can have special meanings within patterns.
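
Turning a literal phrase into a pattern is mechanical, as this Python sketch shows. The function name literal_to_pattern is invented, and this is not necessarily how YAL itself does the job. Note also that %s matches any space character, not only the blank space, so the resulting pattern is very slightly more permissive than the literal text.

```python
MAGIC = "^$()%.[]*+-?"


def literal_to_pattern(text):
    """Turn literal text into a pattern that matches it."""
    pieces = []
    for ch in text:
        if ch == " ":
            pieces.append("%s")       # blank spaces are not allowed in patterns
        elif ch in MAGIC:
            pieces.append("%" + ch)   # magic characters need a % in front
        else:
            pieces.append(ch)
    return "".join(pieces)


print(literal_to_pattern('did you say "another $2"?'))
```

Applied to the phrase above it prints did%syou%ssay%s"another%s%$2"%? which is the pattern already given.
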
^ denotes the beginning of a line.
$ denotes the end of a line.
So to search for a line starting
                          From:
as one might find in an email, you would use the pattern
                         ^From:
The pattern . matches any single character (that includes a blank space character).

A character-class is a subset of the 256 possible single-byte characters (ASCII and its 8-bit extensions). Here is a list of standard ones:

%a   all letters
     can also be written [A-Za-z]
%A   all non-letters
%c   all control characters
%C   all non-control characters
%d   all digits
     can also be written [0-9]
%D   all non-digits
%g   all printable characters except a space
%G   a space or any non-printable character
%l   all lower-case letters
     can also be written [a-z]
%L   anything not a lower-case letter
%p   all punctuation characters
%P   all non-punctuation characters
%s   all space characters
%S   all non-space characters
%u   all upper-case letters
     can also be written [A-Z]
%U   anything not an upper-case letter
%w   all alphanumeric characters
     can also be written [0-9A-Za-z]
%W   anything not an alphanumeric character
%x   any hexadecimal digit
     can also be written [0-9A-Fa-f]
%X   anything not a hexadecimal digit
%z   ASCII nul
Any other character preceded by a percent sign just represents itself.

Square brackets [...], within which the first character is not ^, represent the character-class which is the union of the classes and characters it contains. The complementary character-class is denoted by [^...]. So, for example, %S is the same as [^%s].
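
The relations between classes, unions and complements can be checked by brute force over all 256 characters. The Python sketch below uses byte patterns from Python's re module as stand-ins (\s for %s, \S for %S, \d for %d); that these agree with the classes listed above is an assumption which holds for plain ASCII but may not hold for other alphabets. The helper name members is invented.

```python
import re

ALL_BYTES = [bytes([n]) for n in range(256)]


def members(regex):
    """Return the set of single bytes that the byte pattern regex matches."""
    return {b for b in ALL_BYTES if re.fullmatch(regex, b)}


space = members(rb"\s")          # stands in for %s
non_space = members(rb"\S")      # stands in for %S
complement = members(rb"[^\s]")  # stands in for [^%s]
union = members(rb"[\d\s]")      # stands in for [%d%s]

print(non_space == complement)   # %S and [^%s] pick out the same characters
print(union == members(rb"\d") | space)
```

Both comparisons print True, and between them a class and its complement account for all 256 characters.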

A character-class or character can be qualified by one of the following postfix operators to give a pattern:

*   matches zero or more repetitions of characters in the class, greedily,
    i.e. the longest sequence it can.
+   matches one or more repetitions of characters in the class, greedily,
    i.e. the longest sequence it can.
-   matches the shortest sequence it can of zero or more repetitions of
    characters in the class.
?   matches at most one character in the class.
Thus
              tooth%s?some
matches only
              toothsome
and
             tooth some
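
The difference between the greedy * and the shortest-match - is easiest to see side by side. Python's re module spells the shortest-match operator *? rather than -, so the sketch below uses hand-translated Python equivalents; the sample strings are invented.

```python
import re

text = "<b>bold</b>"

# Lua-style <.*> : greedy, runs on to the last >
greedy = re.search(r"<.*>", text).group()
# Lua-style <.-> : shortest match, stops at the first > (Python spells this *?)
shortest = re.search(r"<.*?>", text).group()

# Lua-style tooth%s?some : at most one space character between the halves
optional = re.compile(r"tooth\s?some")
matches = [s for s in ("toothsome", "tooth some", "tooth  some")
           if optional.fullmatch(s)]

print(greedy)
print(shortest)
print(matches)
```

The greedy form takes the whole of <b>bold</b>, the shortest form takes only <b>, and the version with two spaces, tooth  some, is not matched.
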
A pattern can contain sub-patterns enclosed in parentheses (round brackets), which denote captures. When a match succeeds the substrings matched by the captures are available for use in the pattern as the expressions %1 , %2 , ... %9 . Parentheses are ordered according to the position of their opening parenthesis. As a special case the empty capture () yields the position in the string of the opening parenthesis, a number.

For example

                   (['"])quoted%1
matches
                     'quoted'
or
                     "quoted"
The pattern denoted by %f followed by a character-class matches an empty string at a frontier position; i.e. so that the next character belongs to the class but the previous one does not.
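
Both back-references to captures and frontier patterns have close counterparts in Python's re module, which makes them easy to experiment with. In this sketch the Python expressions are hand-made translations, not the YAL syntax: %1 is written \1, and the frontier %f[%a] is rendered as a look-behind followed by a look-ahead. The sample strings are invented.

```python
import re

# Lua-style (['"])quoted%1 ; Python writes the back-reference as \1
quoted = re.compile(r"""(['"])quoted\1""")
quoted_hits = [s for s in ("'quoted'", '"quoted"', "'quoted\"")
               if quoted.search(s)]

# Lua-style %f[%a]cat ; the frontier becomes a look-behind plus a look-ahead
word_start_cat = re.compile(r"(?<![A-Za-z])(?=[A-Za-z])cat")
frontier_hits = [s for s in ("cat", "a cat sat", "concat", "tomcat")
                 if word_start_cat.search(s)]

print(quoted_hits)
print(frontier_hits)
```

The string with mismatched quotation marks is rejected by the first pattern, and concat and tomcat are rejected by the second because there the letters cat do not begin a run of letters.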

The pattern denoted by %b followed by two distinct characters, say x and y, matches balanced substrings starting with x and ending with y. So %b() matches balanced parentheses, %b[] balanced square brackets, %b{} balanced braces, and so on.
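
Python's re module has nothing corresponding to %b, so the matching rule is written out by hand in the sketch below: count openers up and closers down, and stop when the count returns to zero. The function names match_balanced and find_balanced are invented, and this is an illustration of the rule, not the code that YAL runs.

```python
def match_balanced(text, start, opener, closer):
    """Return the index just past the balanced substring beginning at start,
    or None if text[start] is not opener or the closer never arrives."""
    if start >= len(text) or text[start] != opener:
        return None
    depth = 1
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == closer:
            depth -= 1
            if depth == 0:
                return i + 1
        elif ch == opener:
            depth += 1
    return None


def find_balanced(text, opener, closer):
    """Return the first balanced substring in text, or None."""
    for start in range(len(text)):
        end = match_balanced(text, start, opener, closer)
        if end is not None:
            return text[start:end]
    return None


print(find_balanced("f(a, g(b)) + h(c)", "(", ")"))
```

For the sample line this prints (a, g(b)), the first substring in which the parentheses balance; a line such as ((unclosed yields no match at all.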

The patterns expressible here are those which can be matched without backtracking. This is the syntax used by Lua's standard string library.

References: Programming in Lua by Roberto Ierusalimschy http://www.lua.org/docs.html

Limitations

YAL only deals with text line by line. It cannot search for patterns that occur over multiple lines, nor can it use backtracking. Those would require use of more sophisticated pattern-matching, say parsing expression grammars. It leaves the searched texts strictly alone, and does no replacement.
G.C.Wraith